Abstract: BotometerLite is advertised as a lightweight bot detector that improves scalability by relying only on user profile information, with the claim that using fewer features entails only a small compromise in individual accuracy. We test the validity of this claim by comparing Botometer and BotometerLite bot likelihood scores for 75,000 users across 5 data sets: we randomly sampled 15,000 users from each of the following sources: Coronavirus, 2016 election, News outlets, Charlottesville, and the Twitter API. BotometerLite scores diverged drastically from Botometer scores.
Botometer is one of the most popular bot detection tools used in social science (Rauchfleisch and Kaiser 2020). However, due to Botometer API rate limits, Beskow et al. (2018) recommend a tiered framework for bot detection and suggest that models focusing only on user profile information can be used at scale for general estimates of bot penetration.
Yuan, Schuchard, and Crooks (2019) used DeBot for large-scale bot annotation when examining tweets related to the 2015 California Disneyland measles outbreak, whereas Broniatowski, Hilyard, and Dredze (2016) used Botometer for small-scale bot annotation.
Dunn et al. (2020) annotated bots based on Botometer scores of 0.5 or greater when assessing the limited role of bots in spreading vaccine-critical information. Botometer’s FAQ page explicitly states: “It’s tempting to set some arbitrary threshold score and consider everything above that number a bot and everything below a human, but we do not recommend this approach. Binary classification of accounts using two classes is problematic because few accounts are completely automated”. Instead, Botometer recommends setting a threshold on the CAP score. Dunn et al. (2020) acknowledge the imprecision inherent in bot detection and state that their conclusions are robust to differences related to the imprecision of the bot proportion estimate.
Botometer was initially launched in May 2014, and BotometerLite was released in September 2020. BotometerLite improves scalability by relying only on user profile information; Yang et al. (2020) claim that using fewer features entails only a small compromise in individual accuracy. The training and performance evaluation of BotometerLite are described in “Scalable and Generalizable Social Bot Detection through Data Selection” (Yang et al. 2020).
Rauchfleisch and Kaiser (2020) found that Botometer scores are imprecise at estimating bots, especially in languages other than English, and are prone to variance over time, misclassifying a high number of human users as bots and vice versa.
Many researchers annotate bots based on Botometer score thresholds, in line with the precedent established in previous literature (add citations). Understanding how BotometerLite performs relative to Botometer is therefore critical: researchers should not assume that BotometerLite can serve as a scalable substitute for Botometer.
In this study, we seek to answer the following questions:
The Botometer FAQ defines bot scores and the Complete Automation Probability (CAP) as follows:
The Complete Automation Probability is defined as “the probability, according to our models, that an account with this score or greater is a bot.” The Botometer website uses the CAP to express “the percentage of accounts with bot score above a given account that are labeled as humans. Think of this as the chances that you would wrongly classify a human as a bot if you used this account’s score as a threshold. You would want this probability to be pretty small, say less than 5%. (For the statisticians, this is a p-value.)”
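As a rough sketch of the recommended rule (the function name and the 5% cutoff are illustrative choices, not part of the Botometer API), thresholding on the CAP rather than on the raw bot score might look like:

```python
# Hypothetical CAP-based labeling: call an account a bot only when the
# estimated chance of wrongly flagging a human at that score is below
# the chosen false-positive cutoff.
def label_account(cap_score: float, max_false_positive: float = 0.05) -> str:
    """CAP is the probability that an account with this score or greater
    is a bot, so 1 - CAP bounds the human false-positive rate."""
    return "bot" if (1.0 - cap_score) < max_false_positive else "human"

print(label_account(0.97))  # 1 - 0.97 = 0.03 < 0.05, so "bot"
print(label_account(0.80))  # 1 - 0.80 = 0.20, so "human"
```

This makes labeling deliberately conservative: accounts near the middle of the score range stay labeled human, which is consistent with the FAQ’s warning against arbitrary binary thresholds.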
The following preliminary results explore the similarity between Botometer and BotometerLite scores for users from the Coronavirus and 5G data sets, as well as a random sample of tweets collected on 1 November 2020.
Number of accounts (and fraction of the sample, in parentheses) with raw scores greater than \(k = 0.75\)
| Category | Coronavirus | Election 2016 | News Outlets | Charlottesville | 5G | Random |
|---|---|---|---|---|---|---|
| Sample Size | 10000 | - | - | - | 8677 | 3241 |
| Astroturf | 1443 (0.14) | - | - | - | 512 (0.06) | 222 (0.07) |
| Fake Follower | 904 (0.09) | - | - | - | 746 (0.09) | 553 (0.17) |
| Spammer | 134 (0.01) | - | - | - | 216 (0.02) | 83 (0.03) |
| Financial | 86 (0.01) | - | - | - | 65 (0.01) | 25 (0.01) |
| Self Declared | 274 (0.03) | - | - | - | 451 (0.05) | 290 (0.09) |
| Other | 4763 (0.48) | - | - | - | 1705 (0.2) | 1755 (0.54) |
| Overall | 3577 (0.36) | - | - | - | 1360 (0.16) | 1246 (0.38) |
| CAP | 8090 (0.81) | - | - | - | 4617 (0.53) | 2693 (0.83) |
| BotometerLite | 550 (0.06) | - | - | - | 623 (0.07) | 3241 (1) |
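The counts in the table could be reproduced with a sketch like the following (the data frame, its column names, and its values are hypothetical stand-ins for the per-account score output, not actual Botometer fields):

```python
import pandas as pd

# Hypothetical per-account score frame: one column per Botometer sub-score
# plus the BotometerLite score, all on a 0-1 scale (illustrative values).
scores = pd.DataFrame({
    "astroturf":     [0.80, 0.10, 0.90],
    "fake_follower": [0.20, 0.85, 0.05],
    "botometerlite": [0.76, 0.30, 0.90],
})

k = 0.75                    # raw-score threshold used in the table
above = scores > k          # boolean frame: score exceeds the threshold
counts = above.sum()        # number of accounts above k, per category
fractions = above.mean()    # same counts as a share of the sample size
print(counts)
print(fractions.round(2))
```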
The Coronavirus data set doesn’t look much different from the random sample, except that it had relatively higher astroturf scores. Given the political nature of COVID prevention measures, this result is somewhat expected. The random sample had the highest fake follower, spammer, and self-declared scores. The 5G data set had relatively low scores across all categories.
The 5G data set stands out the most in terms of bot counts and score distribution shapes.
BotometerLite scores are most similar to the Botometer fake follower and spammer scores, with \(R^2\) values of 0.313 and 0.26, respectively. Hence, if Botometer scores are accurate, BotometerLite may be somewhat effective at identifying some fake followers and spammers.
CAP is the probability that an account with this score or greater is a bot. Therefore, if we model each account’s bot status as an independent Bernoulli trial with success probability \(p_i\) equal to its CAP score, the total number of bots \(B\) follows a Poisson binomial distribution, and the expected number of bots is given by:

\[
E[B] = \sum_{i=1}^{n} p_i
\]
Hence, we should expect 13524 (61.7%) of the 21918 accounts to be bots.
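The arithmetic behind that figure can be sketched as follows (the CAP values below are illustrative, not the study’s data):

```python
# Poisson binomial model: account i is a bot with probability p_i (its CAP
# score), so the expected number of bots is simply the sum of the p_i.
cap_scores = [0.9, 0.8, 0.2, 0.6]  # illustrative CAP scores for 4 accounts
expected_bots = sum(cap_scores)    # about 2.5 expected bots among 4 accounts
print(expected_bots)
```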
Expected bots by data set:
Future work for course project:
Questions:
The Pearson correlation matrix (the \(R^2\) values are the squares of the entries of this matrix) also shows the scores are weakly correlated.
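As an illustration of how the \(R^2\) values relate to the correlation matrix (the score vectors below are synthetic, not the study’s scores):

```python
import numpy as np

rng = np.random.default_rng(0)
botometer = rng.random(100)                    # illustrative Botometer scores
lite = 0.5 * botometer + 0.5 * rng.random(100) # weakly related Lite scores

r = np.corrcoef(botometer, lite)[0, 1]  # off-diagonal Pearson coefficient
r_squared = r ** 2                      # squaring an entry gives its R^2
print(round(r, 3), round(r_squared, 3))
```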
Beskow, David, Kathleen M Carley, Halil Bisgin, Ayaz Hyder, Chris Dancy, and Robert Thomson. 2018. “Introducing Bothunter: A Tiered Approach to Detecting and Characterizing Automated Activity on Twitter.” In International Conference on Social Computing, Behavioral-Cultural Modeling and Prediction and Behavior Representation in Modeling and Simulation. Springer.
Broniatowski, David A, Karen M Hilyard, and Mark Dredze. 2016. “Effective Vaccine Communication During the Disneyland Measles Outbreak.” Vaccine 34 (28). Elsevier: 3225–8.
Dunn, Adam G, Didi Surian, Jason Dalmazzo, Dana Rezazadegan, Maryke Steffens, Amalie Dyda, Julie Leask, Enrico Coiera, Aditi Dey, and Kenneth D Mandl. 2020. “Limited Role of Bots in Spreading Vaccine-Critical Information Among Active Twitter Users in the United States: 2017–2019.” American Journal of Public Health 110 (S3). American Public Health Association: S319–S325.
Rauchfleisch, Adrian, and Jonas Kaiser. 2020. “The False Positive Problem of Automatic Bot Detection in Social Science Research.” Berkman Klein Center Research Publication, nos. 2020-3.
Yang, Kai-Cheng, Onur Varol, Pik-Mai Hui, and Filippo Menczer. 2020. “Scalable and Generalizable Social Bot Detection Through Data Selection.” In Proceedings of the AAAI Conference on Artificial Intelligence, 34 (01): 1096–1103.
Yuan, Xiaoyi, Ross J Schuchard, and Andrew T Crooks. 2019. “Examining Emergent Communities and Social Bots Within the Polarized Online Vaccination Debate in Twitter.” Social Media + Society 5 (3). SAGE Publications: 2056305119865465.